Wed AM1.L5: Multimodal Learning for Audio and Language
Wed, 6 Sep, 10:30 - 12:30 Finland Time (UTC +3)
Location: Press room
Session Type: Lecture
Session Chair: Xubo Liu, University of Surrey
Track: Special Sessions
Wed, 6 Sep, 10:30 - 10:50 Finland Time (UTC +3)

Wed AM1.L5.1: KNOWLEDGE DISTILLATION FOR EFFICIENT AUDIO-VISUAL VIDEO CAPTIONING

Özkan Çaylı, Izmir Katip Çelebi University, Turkey; Xubo Liu, University of Surrey, United Kingdom; Volkan Kılıç, Izmir Katip Çelebi University, Turkey; Wenwu Wang, University of Surrey, United Kingdom
Wed, 6 Sep, 10:50 - 11:10 Finland Time (UTC +3)

Wed AM1.L5.2: ATTENTION-BASED METHODS FOR AUDIO QUESTION ANSWERING

Parthasaarathy Ariyakulam Sudarsanam, Tuomas Virtanen, Tampere University, Finland
Wed, 6 Sep, 11:10 - 11:30 Finland Time (UTC +3)

Wed AM1.L5.3: ENHANCING AUDIO RETRIEVAL WITH ATTENTION-BASED ENCODER FOR AUDIO FEATURE REPRESENTATION

Feiyang Xiao, Harbin Engineering University, China; Qiaoxi Zhu, University of Technology Sydney, Australia; Jian Guan, Harbin Engineering University, China; Wenwu Wang, University of Surrey, United Kingdom
Wed, 6 Sep, 11:30 - 11:50 Finland Time (UTC +3)

Wed AM1.L5.4: Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer

Etienne Labbé, Julien Pinquier, Thomas Pellegrini, IRIT, France
Wed, 6 Sep, 11:50 - 12:10 Finland Time (UTC +3)

Wed AM1.L5.5: Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study

Yi Yuan, Haohe Liu, University of Surrey, United Kingdom; Jinhua Liang, Queen Mary University of London, United Kingdom; Xubo Liu, Mark D. Plumbley, Wenwu Wang, University of Surrey, United Kingdom
Wed, 6 Sep, 12:10 - 12:30 Finland Time (UTC +3)

Wed AM1.L5.6: ACES: EVALUATING AUTOMATED AUDIO CAPTIONING MODELS ON THE SEMANTICS OF SOUNDS

Gijs Wijngaard, Elia Formisano, Maastricht University, Netherlands; Bruno Giordano, CNRS and Université Aix-Marseille, France; Michel Dumontier, Maastricht University, Netherlands